Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning
Authors
Abstract
We suggest a novel approach for estimating the posterior distribution of the weights of a neural network, using an online version of the variational Bayes method. Having a confidence measure over the weights allows us to combat several shortcomings of neural networks, such as their parameter redundancy and their notorious vulnerability to changes in the input distribution ("catastrophic forgetting"). Specifically, we show that this approach helps alleviate the catastrophic forgetting phenomenon, even without knowledge of when tasks are switched. Furthermore, it improves the robustness of the network to weight pruning, even without re-training.
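To make the update concrete, below is a minimal NumPy sketch of an online variational Bayes step of this flavor, with a mean-field Gaussian posterior q(θ) = N(μ, σ²) and the reparameterization θ = μ + σε. The closed-form updates follow the BGD-style rules in which the per-weight step is modulated by the posterior variance; the names `grad_fn`, `eta`, and `n_samples` are illustrative, and this is a sketch rather than the authors' reference implementation.

```python
import numpy as np

def bgd_step(mu, sigma, grad_fn, eta=1.0, n_samples=10, rng=None):
    """One online variational Bayes step for q(theta) = N(mu, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    g_mean = np.zeros_like(mu)   # Monte Carlo estimate of E_eps[dL/dtheta]
    g_eps = np.zeros_like(mu)    # Monte Carlo estimate of E_eps[eps * dL/dtheta]
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        theta = mu + sigma * eps          # reparameterization trick
        g = grad_fn(theta)
        g_mean += g / n_samples
        g_eps += eps * g / n_samples
    # Mean update: the effective step size is scaled per weight by its
    # posterior variance, so low-uncertainty weights barely move. This is
    # what protects earlier tasks without explicit task boundaries.
    mu_new = mu - eta * sigma**2 * g_mean
    # Closed-form std update; the sqrt form keeps sigma strictly positive.
    half = 0.5 * sigma * g_eps
    sigma_new = sigma * np.sqrt(1.0 + half**2) - sigma * half
    return mu_new, sigma_new
```

The learned posterior also suggests a natural pruning criterion: weights with a low signal-to-noise ratio |μ|/σ can be removed first, which is one plausible reading of the pruning robustness claimed above.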
Similar papers
On Quadratic Penalties in Elastic Weight Consolidation
Elastic weight consolidation [EWC, Kirkpatrick et al., 2017] is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation [Eskin et al., 2004], and this view is consistent with the motivation given by Kirkpatrick et al. [2017]. In this note, I present an extended derivation that covers the case when the...
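For reference, the quadratic penalty under discussion can be sketched in a few lines; `fisher_diag` is a diagonal Fisher information estimate at the previous task's optimum `theta_star`, and `lam` is the regularization strength (the variable names are illustrative).

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """EWC-style quadratic penalty anchoring theta to the previous task's
    optimum theta_star, weighted by a diagonal Fisher information estimate.
    Added to the new task's loss: total = task_loss(theta) + ewc_penalty(...)."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)
```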
Overcoming catastrophic forgetting with hard attention to the task
Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks’ information without affecting the current task’s learning. ...
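A minimal sketch of what such task-conditioned gating can look like for one layer, assuming a HAT-style recipe in which a learnable per-task embedding is squashed into a near-binary mask (the names `task_embedding` and `s` are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_attention_forward(h, task_embedding, s=400.0):
    """Gate a layer's units for the current task.

    h              : layer activations, shape (batch, units)
    task_embedding : learnable per-task, per-layer embedding, shape (units,)
    s              : gate sharpness; large s pushes the sigmoid toward {0, 1},
                     so each task claims a near-binary subset of units.
    """
    a = sigmoid(s * task_embedding)   # almost-binary attention mask
    return h * a                      # units with a ~ 0 are inactive for this task
```

In the full method the sharpness s is annealed during training, and gradients for units claimed by earlier tasks are damped according to the accumulated masks; the sketch shows only the forward gating.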
Less-forgetting Learning in Deep Neural Networks
The catastrophic forgetting problem makes deep neural networks forget previously learned information when learning from data collected in new environments, such as by different sensors or in different lighting conditions. This paper presents a new method for alleviating the catastrophic forgetting problem. Unlike previous research, our method does not use any information from the source domain. Surp...
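One way to instantiate this idea without source-domain data is to use the frozen source network itself as the preservation signal: run both networks on the new data and penalize drift in the feature representations, alongside the usual task loss. A hedged NumPy sketch, with illustrative names:

```python
import numpy as np

def less_forgetting_loss(logits, labels_onehot, feat_new, feat_old, lam=1.0):
    """Cross-entropy on the new data plus a Euclidean penalty keeping the
    new network's features close to the frozen source network's features.
    feat_old comes from running the *old* network on the *new* inputs,
    so no source-domain data is needed."""
    # Numerically stable softmax cross-entropy on the new task.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ce = -np.mean(np.sum(labels_onehot * np.log(probs + 1e-12), axis=1))
    # Feature-preservation term against the frozen source network.
    preserve = np.mean(np.sum((feat_new - feat_old) ** 2, axis=1))
    return ce + lam * preserve
```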
Variational Stochastic Gradient Descent
In the Bayesian approach to probabilistic modeling of data, we select a model for the probabilities of the data that depends on a continuous vector of parameters. For a given data set, Bayes' theorem gives a probability distribution over the model parameters. The inference of outcomes and probabilities for new data can then be found by averaging over the parameter distribution of the model, which is an intr...
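Concretely, that averaging step is a Monte Carlo estimate of the posterior predictive distribution. A minimal sketch, assuming a Gaussian approximate posterior and an illustrative `predict_fn(x, theta)`:

```python
import numpy as np

def posterior_predictive(x, mu, sigma, predict_fn, n_samples=100, rng=None):
    """Approximate p(y | x, D) = integral of p(y | x, theta) q(theta) dtheta
    by averaging predictions over samples from q(theta) = N(mu, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    preds = 0.0
    for _ in range(n_samples):
        theta = mu + sigma * rng.standard_normal(mu.shape)
        preds += predict_fn(x, theta) / n_samples
    return preds
```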
Algorithmic improvements for variational inference
Variational methods for approximate inference in machine learning often adapt a parametric probability distribution to optimize a given objective function. This view is especially useful when applying variational Bayes (VB) to models outside the conjugate-exponential family. For them, variational Bayesian expectation maximization (VB EM) algorithms are not easily available, and gradient-based m...
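As a concrete instance of such gradient-based optimization, the sketch below fits a Gaussian q(θ) = N(μ, σ²) to the posterior of a simple logistic model (θ ~ N(0, 1), y_i ~ Bernoulli(sigmoid(θ))), which lies outside the conjugate-exponential family, by ascending reparameterized Monte Carlo gradients of the ELBO. All names and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_gaussian_vb(y, steps=2000, lr=0.05, n_samples=8, rng=None):
    """Maximize the ELBO for: theta ~ N(0, 1), y_i ~ Bernoulli(sigmoid(theta)).
    Non-conjugate, so we ascend reparameterized Monte Carlo gradients
    instead of using closed-form VB EM updates."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        sigma = np.exp(log_sigma)
        g_mu, g_log_sigma = 0.0, 0.0
        for _ in range(n_samples):
            eps = rng.standard_normal()
            theta = mu + sigma * eps
            # d/dtheta of the log joint: sum_i (y_i - sigmoid(theta)) - theta
            g = np.sum(y - sigmoid(theta)) - theta
            g_mu += g / n_samples
            g_log_sigma += g * sigma * eps / n_samples
        g_log_sigma += 1.0  # gradient of the Gaussian entropy w.r.t. log sigma
        mu += lr * g_mu
        log_sigma += lr * g_log_sigma
    return mu, np.exp(log_sigma)
```

For example, `fit_gaussian_vb(np.array([1, 1, 0, 1]))` returns a mean and standard deviation summarizing the approximate posterior over the logit θ.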